Temporal-Difference Methods

07. Quiz: Q-Learning

Say that an agent is learning to navigate the gridworld described earlier in the lesson.

Suppose the agent is using Q-Learning in its search for the optimal policy, with \alpha = 0.1.

At the end of the 99th episode, the Q-table has the following values:

Say that at the beginning of the 100th episode, the agent starts in state 1 and selects action right. As a result, it receives reward -1, and the next state is state 2.

In the previous video, you learned that at this point in time, the agent updates the Q-table.

Which entry in the Q-table is updated?

The entry corresponding to state 1 and action left.

The entry corresponding to state 2 and action left.

The entry corresponding to state 1 and action right.

The entry corresponding to state 2 and action right.

SOLUTION:

The entry corresponding to **state 1** and **action right**.
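
The reason only this entry changes can be seen from the usual textbook form of the Q-Learning update, sketched below. The discount rate \gamma is not stated in this quiz, so including it here is an assumption; with \gamma = 1 it simply drops out.

```latex
Q(S_t, A_t) \leftarrow Q(S_t, A_t)
  + \alpha \left( R_{t+1} + \gamma \max_{a} Q(S_{t+1}, a) - Q(S_t, A_t) \right)
```

In this quiz S_t is state 1, A_t is action right, R_{t+1} is -1, and S_{t+1} is state 2. The left-hand side is Q(state 1, right), so that is the only entry written to; the entries for state 2 are read (through the max) but not modified.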

What is the new value in the Q-table corresponding to the state-action pair you selected in the answer to the question above?

(Suppose that when selecting the actions for the first two timesteps in the 100th episode, the agent was following the epsilon-greedy policy with respect to the Q-table, with epsilon = 0.4.)

6.1

6.16

6.2

SOLUTION:

6.2
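
A minimal Python sketch of the arithmetic behind a value like this is given below. The Q-table entries in it are hypothetical (they are not the lesson's table, which is not reproduced on this page) and were picked only so that the result matches 6.2. The discount rate `gamma = 1.0` and the helper name `q_learning_update` are also assumptions made for illustration.

```python
def q_learning_update(Q, state, action, reward, next_state, alpha, gamma=1.0):
    """Return the new value of Q[state][action] after one Q-Learning step.

    gamma=1.0 is an assumption; the quiz does not state a discount rate.
    """
    # Q-Learning bootstraps from the best action value in the next state,
    # regardless of which action the agent actually takes there.
    best_next = max(Q[next_state].values())
    td_target = reward + gamma * best_next
    return Q[state][action] + alpha * (td_target - Q[state][action])


# Hypothetical Q-table entries, NOT taken from the lesson.
Q = {
    1: {"left": 5.0, "right": 6.0},
    2: {"left": 7.0, "right": 9.0},
}

new_value = q_learning_update(
    Q, state=1, action="right", reward=-1, next_state=2, alpha=0.1
)
print(round(new_value, 2))  # 6.2
```

Note that epsilon does not appear anywhere in the update: the max over the next state's action values is used no matter which action the epsilon-greedy policy goes on to select in state 2.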
